NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Transformers are Efficient Compilers, Provably

Zhai, X; Zhou, R; Zhang, L; Du, S (December 2025, Conference on Language Modeling (COLM) 2025)

Full Text Available
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts

Li, J; Yang, Y; Yang, S; Zhang, L; Wu, YN (September 2025, ACL (Association for Computational Linguistics))

Full Text Available
MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment

Wang, T; Gui, D; Hu, Y; Lin, S; Zhang, L (July 2025, Proceedings of Machine Learning Research)

Reinforcement Learning from Human Feedback (RLHF) has shown promise in aligning large language models (LLMs). Yet its reliance on a singular reward model often overlooks the diversity of human preferences. Recent approaches address this limitation by leveraging multi-dimensional feedback to fine-tune corresponding reward models and train LLMs using reinforcement learning. However, the process is costly and unstable, especially given the competing and heterogeneous nature of human preferences. In this paper, we propose Mixing Preference Optimization (MPO), a post-processing framework for aggregating single-objective policies as an alternative to both multi-objective RLHF (MORLHF) and MaxMin-RLHF. MPO avoids alignment from scratch. Instead, it log-linearly combines existing policies into a unified one with the weight of each policy computed via a batch stochastic mirror descent. Empirical results demonstrate that MPO achieves balanced performance across diverse preferences, outperforming or matching existing models with significantly reduced computational costs.
more » « less
Full Text Available
Frictional properties of the Pelona--Orocopia--Rand schists in hydrothermal conditions and implications for seismicity in Southern California

Guvercin, SE; Barbot, S; Zhang, L; Yang, Z; Platt, J; Seyler, C; Phillips, N (August 2025, Earth and planetary science letters)
Avouac, J-P (Ed.)
The Pelona–Orocopia–Rand (POR) schists were emplaced during the Farallon flat subduction in the early Cenozoic and now occupy the root of major strike-slip faults of the San Andreas Fault system. The POR schists are considered frictionally stable at lower temperatures than other basement rocks, limiting the maximum depth of seismicity in Southern California. However, experimental constraints on the composition and frictional properties of POR schists are still missing. Here, we study the frictional behavior of synthetic gouge derived from Pelona, Portal, and Rand Mountain schist wall rocks under hydrothermal, triaxial conditions. We conduct velocity-step experiments from 0.04 to 1 μm/s from room temperature to 500ºC under 200 MPa effective normal stress, including a 30 MPa porefluid pressure. The frictional stability of POR schists in the lower crust is caused by a thermally activated transition from slip-rate- and state-dependent friction to inherently stable, rate-dependent creep between 300ºC and 500ºC, depending on sample composition and slip-rate. The mineralogy of POR schists shows much variability caused by different protoliths and metamorphic grades, featuring various amounts of phyllosilicates, quartz, feldspar, and amphibole. Pelona and Portal schists exhibit a velocity-weakening regime enabling the nucleation and propagation of earthquakes when exhumed in the middle crust, as in the Mojave section of the San Andreas Fault. The contrasted frictional properties of POR schists exemplify the lithological control of seismic processes and associated hazards.
more » « less
Full Text Available
FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees

Nie, F; Hou, X; Lin, S; Zou, J; Yao, H; Zhang, L (July 2025, Proceedings of Machine Learning Research)

The propensity of large language models (LLMs) to generate hallucinations and non-factual content undermines their reliability in high-stakes domains, where rigorous control over Type I errors (the conditional probability of incorrectly classifying hallucinations as truthful content) is essential. Despite its importance, formal verification of LLM factuality with such guarantees remains largely unexplored. In this paper, we introduce FACTTEST, a novel framework that statistically assesses whether an LLM can provide correct answers to given questions with high-probability correctness guarantees. We formulate hallucina- tion detection as a hypothesis testing problem to enforce an upper bound of Type I errors at user-specified significance levels. Notably, we prove that FACTTEST also ensures strong Type II error control under mild conditions and can be extended to maintain its effectiveness when covariate shifts exist. FACTTEST is distribution-free and and model-agnostic. It works for any number of human-annotated samples and applies to any black-box or white-box LM. Extensive experiments demonstrate that FACTTEST effectively detects hallucinations and enable LLMs to abstain from answering unknown questions, leading to an over 40% accuracy improvement.
more » « less
Full Text Available
ROSERAG: Robust Retrieval-augmented Generation with Small-scale LLMs via Margin-aware Preference Optimization

Liu, T; Jiang, H; Wang, T; Xu, R; Yu, Y; Zhang, L; Zhao, T; Wang, H (July 2025, Proceedings of Machine Learning Research)

Large language models (LLMs) have achieved impressive performance but face high computational costs and latency, limiting their deployment in resource-constrained settings. In contrast, small-scale LLMs (SLMs) are more efficient yet struggle to capture evolving real-world knowledge. Retrieval-augmented generation (RAG) helps by integrating external knowledge, but imperfect retrieval can introduce distracting noise that misleads SLMs. We propose {\name}, a robust RAG framework for SLMs via Margin-aware Preference Optimization. {\name} employs multi-turn prompting for detailed reasoning, rejection sampling for high-quality explanations, and contrastive preference selection to refine responses by maximizing the likelihood gap between preferred and non-preferred outputs.
more » « less
Full Text Available
IPComp: Interpolation Based Progressive Lossy Compression for Scientific Applications

Yang, Z; Di, S; Zhang, L; Li, R; Li, X; Huang, J; Liu, J; Cappello, F; Zhao, K (July 2025, The 34th ACM International Symposium on High-Performance Parallel and Distributed Computing)

Full Text Available
Mitigating Heterogeneous Token Overfitting in LLM Knowledge Editing

Liu, T; Li, R; Dong, Z; Liu, H; Tang, X; Yin, Q; Zhang, L; Wang, H; Gao, J (July 2025, Proceedings of Machine Learning Research)

Large language models (LLMs) have achieved remarkable performance on various natural language tasks. However, they are trained on static corpora and their knowledge can become outdated quickly in the fast-changing world. This motivates the development of knowledge editing (KE) to update specific knowledge in LLMs without changing unrelated others or compromising their pre-trained capabilities. Previous efforts sought to update a small amount of parameters of a LLM and proved effective for making selective updates. Nonetheless, the edited LLM often exhibits degraded ability to reason about the new knowledge. In this work, we identify a key issue: \textit{heterogeneous token overfitting} (HTO), where the LLM overfits different tokens in the provided knowledge at varying rates. To tackle this, we propose {\NAME}, a token-level smoothing method that mitigates HTO by adaptively refining the target distribution. Theoretically, {\NAME} offers better parameter updates with negligible computation overhead. It also induces an implicit DPO but does not require preference data pairs. Extensive experiments across four editing methods, two LLMs, and diverse scenarios demonstrate the effectiveness and versatility of our method.
more » « less
Full Text Available
Order-Independence Without Fine Tuning

McIlroy-Young, R; Brown, K; Olson, C; Zhang, L; Dwork, C (April 2025, ICLR)

The development of generative language models that can create long and coherent textual outputs via autoregression has lead to a proliferation of uses and a corresponding sweep of analyses as researches work to determine the limitations of this new paradigm. Unlike humans, these ‘Large Language Models’ (LLMs) are highly sensitive to small changes in their inputs, leading to unwanted inconsistency in their behavior. One problematic inconsistency when LLMs are used to answer multiple-choice questions or analyze multiple inputs is order dependency: the output of an LLM can (and often does) change significantly when sub-sequences are swapped, despite both orderings being semantically identical. In this paper we present Set-Based Prompting, a technique that guarantees the output of an LLM will not have order dependence on a specified set of sub-sequences. We show that this method provably eliminates order dependency, and that it can be applied to any transformer-based LLM to enable text generation that is unaffected by re-orderings. Delving into the implications of our method, we show that, despite our inputs being out of distribution, the impact on expected accuracy is small, where the expectation is over the order of uniformly chosen shuffling of the candidate responses, and usually significantly less in practice. Thus, Set-Based Prompting can be used as a ‘dropped-in’ method on fully trained models. Finally, we discuss how our method’s success suggests that other strong guarantees can be obtained on LLM performance via modifying the input representations. Code is available at github.com/reidmcy/set-based-prompting.
more » « less
Full Text Available
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

Xia, P; Zhu, K; Li, H; Wang, T; Shi, W; Wang, S; Zhang, L; Zou, J; Yao, H (April 2025, ICLR)

Artificial Intelligence (AI) has demonstrated significant potential in healthcare, particularly in disease diagnosis and treatment planning. Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. However, these models often suffer from factual hallucination, which can lead to incorrect diagnoses. Fine-tuning and retrieval-augmented generation (RAG) have emerged as methods to address these issues. However, the amount of high-quality data and distribution shifts between training data and deployment data limit the application of fine-tuning methods. Although RAG is lightweight and effective, existing RAG-based approaches are not sufficiently general to different medical domains and can potentially cause misalignment issues, both between modalities and between the model and the ground truth. In this paper, we propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs. Our approach introduces a domain-aware retrieval mechanism, an adaptive retrieved contexts selection, and a provable RAG-based preference fine-tuning strategy. These innovations make the RAG process sufficiently general and reliable, significantly improving alignment when introducing retrieved contexts. Experimental results across five medical datasets (involving radiology, ophthalmology, pathology) on medical VQA and report generation demonstrate that MMed-RAG can achieve an average improvement of 43.8% in factual accuracy in the factual accuracy of Med-LVLMs.
more » « less
Full Text Available

« Prev Next »

Search for: All records